Optimizing I/O Costs of Multi-dimensional Queries Using Bitmap Indices

نویسندگان

  • Doron Rotem
  • Kurt Stockinger
  • Kesheng Wu
چکیده

Bitmap indices are efficient data structures for processing complex, multi-dimensional queries in data warehouse applications and scientific data analysis. For high-cardinality attributes, a common approach is to build bitmap indices with binning. This technique partitions the attribute values into a number of ranges, called bins, and uses bitmap vectors to represent bins (attribute ranges) rather than distinct values. In order to yield exact query answers, parts of the original data values have to be read from disk for checking against the query constraint. This process is referred to as candidate check and usually dominates the total query processing time. In this paper we study several strategies for optimizing the candidate check cost for multi-dimensional queries. We present an efficient candidate check algorithm based on attribute value distribution, query distribution as well as query selectivity with respect to each dimension. We also show that re-ordering the dimensions during query evaluation can be used to reduce I/O costs. We tested our algorithm on data with various attribute value distributions and query distributions. Our approach shows a significant improvement over traditional binning strategies for

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards Optimal Multi-Dimensional Query Processing with Bitmap Indices

Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices would not work efficiently. This paper studies strategies for minimizing the access costs for processing multi-dimensional queries using bitmap indices with binning. Innovative features of our algorithm include (a) optimall...

متن کامل

Array-Based Evaluation of Multi-Dimensional Queries in Object-Relational Databases Systems

Since multi-dimensional arrays are a natural data structure for supporting multi-dimensional queries, and object-relational database systems support multi-dimensional array ADTs, it is natural to ask if a multi-dimensional array-based ADT can be used to improve O/R DBMS performance on multi-dimensional queries. As an initial step toward answering this question, we have implemented a multi-dimen...

متن کامل

Array-Based Evaluation of Multi-Dimensional Queries in Object-Relational Database Systems

Since multi-dimensional arrays are a natural data structure for supporting multi-dimensional queries, and object-relational database systems support multi-dimensional array ADTs, it is natural to ask if a multi-dimensional array-based ADT can be used to improve O/R DBMS performance on multi-dimensional queries. As an initial step toward answering this question, we have implemented a multi-dimen...

متن کامل

Design and Implementation of Bitmap Indices for Scientific Data

Bitmap indices are efficient multi-dimensional index data structures for handling complex adhoc queries in readmostly environments. They have been implemented in several commercial database systems but are only well suited for discrete attribute values which are very common in typical business applications. However, many scientific applications usually operate on floating point numbers and cann...

متن کامل

Bitmap Indices for Fast End-User Physics Analysis in ROOT

Most physics analysis jobs involve multiple selection steps on the input data. These selection steps are called cuts or queries. A common strategy to implement these queries is to read all input data from files and then process the queries in memory. In many applications the number of variables used to define these queries is a relative small portion of the overall data set therefore reading al...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005